Sharing Semantic Resources
The Semantic Web is an extension of the current Web in which information, so far created for human consumption, becomes machine readable, “enabling computers and people to work in cooperation”. Several challenges must still be met to turn this vision into reality; the most important is sharing meaning, formally represented through ontologies or, more generally, semantic resources. This long-term goal of the Semantic Web converges in many ways with work in Human Language Technology, and in particular with the development of Natural Language Processing applications, where there is a great need for multilingual lexical resources. For instance, one of the most important lexical resources, WordNet, is also commonly regarded and used as an ontology. Another important phenomenon today is the explosion of social collaboration: Wikipedia, the largest encyclopedia in the world, is studied as an up-to-date, comprehensive semantic resource. The main topic of this thesis is the collaborative management and exploitation of semantic resources, building on already available resources such as Wikipedia and WordNet. This work presents a general environment able to turn the vision of shared and distributed semantic resources into reality, and describes a distributed three-layer architecture enabling the rapid prototyping of cooperative applications for developing semantic resources.
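To make WordNet's dual role concrete, here is a minimal sketch (assuming NLTK and its WordNet corpus are installed; not code from the thesis) that walks the hypernym hierarchy, the is-a backbone that lets a lexical resource behave like an ontology.

```python
# Minimal sketch: WordNet used as a lightweight ontology via NLTK.
# Assumes `pip install nltk` and nltk.download('wordnet') have been run.
from nltk.corpus import wordnet as wn

# A synset is a node in WordNet's concept graph.
dog = wn.synsets("dog")[0]  # Synset('dog.n.01')
print(dog.definition())

# Hypernym chains form the is-a hierarchy that makes WordNet ontology-like.
for path in dog.hypernym_paths():
    print(" -> ".join(s.name() for s in path))
```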
Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter
Microblogs are increasingly exploited for predicting prices and traded
volumes of stocks in financial markets. However, it has been demonstrated that
much of the content shared in microblogging platforms is created and publicized
by bots and spammers. Yet, the presence (or lack thereof) and the impact of fake stock microblogs have never been systematically investigated before. Here,
we study 9M tweets related to stocks of the 5 main financial markets in the US.
By comparing tweets with financial data from Google Finance, we highlight
important characteristics of Twitter stock microblogs. More importantly, we
uncover a malicious practice - referred to as cashtag piggybacking -
perpetrated by coordinated groups of bots and likely aimed at promoting
low-value stocks by exploiting the popularity of high-value ones. Among the
findings of our study is that as much as 71% of the authors of suspicious
financial tweets are classified as bots by a state-of-the-art spambot detection
algorithm. Furthermore, 37% of them were suspended by Twitter a few months
after our investigation. Our results call for the adoption of spam and bot
detection techniques in all studies and applications that exploit
user-generated content for predicting the stock market.
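As an illustration of the kind of signal the study relies on, here is a minimal sketch (not the authors' code) that extracts cashtags from tweet text with a regular expression and counts how often low-capitalization tickers co-occur with high-capitalization ones; the tweets and the high/low-value ticker sets are hypothetical.

```python
# Hedged sketch: counting cashtag co-occurrences in tweets.
# The tweets and the high/low-value ticker sets are illustrative only.
import re
from collections import Counter
from itertools import product

CASHTAG = re.compile(r"\$[A-Za-z]{1,6}\b")

tweets = [
    "$AAPL breaking out! also watch $XYZ and $ABC",
    "$TSLA $XYZ to the moon",
]
high_value = {"$AAPL", "$TSLA"}  # hypothetical high-cap tickers
low_value = {"$XYZ", "$ABC"}     # hypothetical low-cap tickers

pairs = Counter()
for text in tweets:
    tags = {t.upper() for t in CASHTAG.findall(text)}
    # Count every (high, low) pair mentioned together in one tweet:
    # frequent pairs are candidate piggybacking signals.
    for hi, lo in product(tags & high_value, tags & low_value):
        pairs[(hi, lo)] += 1

print(pairs.most_common())
```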
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambot detection can be achieved via an in-depth analysis of
their collective behaviors exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate among spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach are the possibility of applying off-the-shelf DNA analysis techniques to study online users' behaviors and the ability to rely efficiently on a limited number of lightweight account characteristics.
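To make the idea concrete, the sketch below (a simplification, not the paper's implementation) compares two digital DNA sequences via their longest common substring, computed with standard dynamic programming; groups of spambots tend to share far longer common substrings than groups of genuine users.

```python
# Hedged sketch: similarity between two digital DNA sequences as the
# length of their longest common substring, normalized by sequence length.
def longest_common_substring(a: str, b: str) -> int:
    # Classic O(len(a) * len(b)) dynamic programming, row by row.
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def dna_similarity(a: str, b: str) -> float:
    # 1.0 means one account's behavior is fully embedded in the other's.
    return longest_common_substring(a, b) / min(len(a), len(b))

# Hypothetical sequences: two near-identical bots vs. a heterogeneous human.
print(dna_similarity("ACTACTACTACT", "ACTACTACTTCT"))  # high
print(dna_similarity("ACTACTACTACT", "TTCAATCGGACT"))  # much lower
```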
The Anatomy of Conspirators: Unveiling Traits using a Comprehensive Twitter Dataset
The discourse around conspiracy theories is currently thriving amidst the
rampant misinformation prevalent in online environments. Research in this field
has been focused on detecting conspiracy theories on social media, often
relying on limited datasets. In this study, we present a novel methodology for
constructing a Twitter dataset that encompasses accounts engaged in
conspiracy-related activities throughout the year 2022. Our approach centers on
data collection that is independent of specific conspiracy theories and
information operations. Additionally, our dataset includes a control group
comprising randomly selected users who can be fairly compared to the
individuals involved in conspiracy activities. This comprehensive collection
effort yielded a total of 15K accounts and 37M tweets extracted from their
timelines. We conduct a comparative analysis of the two groups across three
dimensions: topics, profiles, and behavioral characteristics. The results
indicate that conspiracy and control users exhibit similarity in terms of their
profile metadata characteristics. However, they diverge significantly in terms
of behavior and activity, particularly regarding the discussed topics, the
terminology used, and their stance on trending subjects. Interestingly, there
is no significant disparity in the presence of bot users between the two
groups, suggesting that conspiracy and automation are orthogonal concepts.
Finally, we develop a classifier to identify conspiracy users using 93
features, some of which are commonly employed in literature for troll
identification. The results demonstrate a high accuracy level (with an average F1 score of 0.98), enabling us to uncover the most discriminative features associated with conspiracy-related accounts.
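As a hedged illustration of the final classification step (the 93 features themselves are defined in the paper and not reproduced here), a sketch like the following trains a gradient-boosted classifier on a per-account feature matrix and reports macro-averaged F1; the data and feature values are placeholders.

```python
# Hedged sketch: classifying conspiracy vs. control accounts from
# per-account features. X and y are placeholders for the real dataset.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 93))    # 93 per-account features (placeholder)
y = rng.integers(0, 2, size=500)  # 1 = conspiracy, 0 = control

clf = GradientBoostingClassifier()
scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
print(f"mean F1: {scores.mean():.2f}")

# After fitting, feature_importances_ surfaces the most discriminative
# features, mirroring the kind of analysis the paper performs.
clf.fit(X, y)
top = np.argsort(clf.feature_importances_)[::-1][:5]
print("top feature indices:", top)
```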
Modularity-based approach for tracking communities in dynamic social networks
Community detection is a crucial task to unravel the intricate dynamics of
online social networks. The emergence of these networks has dramatically
increased the volume and speed of interactions among users, presenting
researchers with unprecedented opportunities to explore and analyze the
underlying structure of social communities. Despite a growing interest in
tracking the evolution of groups of users in real-world social networks, the
predominant focus of community detection efforts has been on communities within
static networks. In this paper, we introduce a novel framework for tracking
communities over time in a dynamic network, where a series of significant
events is identified for each community. Our framework adopts a
modularity-based strategy and does not require a predefined threshold, leading
to a more accurate and robust tracking of dynamic communities. We validated the
efficacy of our framework through extensive experiments on synthetic networks
featuring embedded events. The results indicate that our framework can
outperform the state-of-the-art methods. Furthermore, we utilized the proposed
approach on a Twitter network comprising over 60,000 users and 5 million tweets
throughout 2020, showcasing its potential in identifying dynamic communities in
real-world scenarios. The proposed framework can be applied to different social
networks and provides a valuable tool to gain deeper insights into the
evolution of communities in dynamic social networks.
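A minimal sketch of snapshot-based community tracking may help. It detects communities per snapshot with networkx's modularity-based greedy algorithm, then matches each old community to its best-overlapping successor using Jaccard overlap, which is a simple stand-in for the paper's threshold-free modularity-based criterion (an assumption, not the authors' method); communities with no overlapping successor are flagged as dead.

```python
# Hedged sketch: tracking communities across two network snapshots.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

def track(g_t0: nx.Graph, g_t1: nx.Graph):
    c0 = [set(c) for c in greedy_modularity_communities(g_t0)]
    c1 = [set(c) for c in greedy_modularity_communities(g_t1)]
    events = []
    for i, old in enumerate(c0):
        sims = [jaccard(old, new) for new in c1]
        j = max(range(len(c1)), key=sims.__getitem__) if c1 else None
        if j is not None and sims[j] > 0:
            events.append(("continue", i, j, round(sims[j], 2)))
        else:
            events.append(("death", i, None, 0.0))
    return events

# Toy example: two snapshots of a small two-community graph.
g0 = nx.planted_partition_graph(2, 10, 0.9, 0.05, seed=1)
g1 = nx.planted_partition_graph(2, 10, 0.9, 0.05, seed=2)
print(track(g0, g1))
```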
The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race
Recent studies in social media spam and automation provide anecdotal evidence of the rise of a new generation of spambots, so-called social spambots. Here, for the first time, we extensively study this novel phenomenon on Twitter and provide quantitative evidence that a paradigm shift exists in spambot design. First, we measure Twitter's current capabilities of detecting
the new social spambots. Later, we assess the human performance in
discriminating between genuine accounts, social spambots, and traditional
spambots. Then, we benchmark several state-of-the-art techniques proposed by
the academic literature. Results show that neither Twitter, nor humans, nor
cutting-edge applications are currently capable of accurately detecting the new
social spambots. Our results call for new approaches capable of turning the
tide in the fight against this rising phenomenon. We conclude by reviewing the latest literature on spambot detection and highlight an emerging common
research trend based on the analysis of collective behaviors. Insights derived
from both our extensive experimental campaign and survey shed light on the most
promising directions of research and lay the foundations for the arms race
against the novel social spambots. Finally, to foster research on this novel
phenomenon, we make publicly available to the scientific community all the
datasets used in this study. (To appear in Proc. 26th WWW 2017, Companion Volume, Web Science Track, Perth, Australia, 3-7 April 2017.)
Fame for sale: efficient detection of fake Twitter followers
Fake followers are those Twitter accounts specifically created to inflate the number of followers of a target account. They are
dangerous for the social platform and beyond, since they may alter concepts
like popularity and influence in the Twittersphere, hence impacting
economy, politics, and society. In this paper, we contribute along different
dimensions. First, we review some of the most relevant existing features and
rules (proposed by Academia and Media) for anomalous Twitter accounts
detection. Second, we create a baseline dataset of verified human and fake
follower accounts. This baseline dataset is publicly available to the
scientific community. Then, we exploit the baseline dataset to train a set of
machine-learning classifiers built over the reviewed rules and features. Our
results show that most of the rules proposed by Media provide unsatisfactory
performance in revealing fake followers, while features proposed in the past by
Academia for spam detection provide good results. Building on the most
promising features, we revise the classifiers both in terms of reduction of
overfitting and cost for gathering the data needed to compute the features. The
final result is a novel classifier: general enough to thwart overfitting, lightweight thanks to the use of the least costly features, and
still able to correctly classify more than 95% of the accounts of the original
training set. We ultimately perform an information fusion-based sensitivity
analysis, to assess the global sensitivity of each of the features employed by
the classifier. The findings reported in this paper, other than being supported
by a thorough experimental methodology and interesting on their own, also pave
the way for further investigation of the novel issue of fake Twitter followers.
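As a hedged sketch of the feature-based approach (the account fields mirror Twitter's public profile metadata, but the feature choices are illustrative, not the paper's exact set), a few lightweight features can be computed from profile data alone, which is what keeps the cost of gathering data low.

```python
# Hedged sketch: lightweight, profile-only features for fake-follower
# detection, fed to a simple classifier. Labels are placeholders.
from sklearn.tree import DecisionTreeClassifier

def features(account: dict) -> list[float]:
    followers = account["followers_count"]
    friends = account["friends_count"]
    return [
        followers / max(friends, 1),           # follower/friend ratio
        account["statuses_count"],             # total tweets posted
        float(account["default_profile_image"]),
        len(account["description"]),           # profile bio length
    ]

# Placeholder labeled accounts: 1 = fake follower, 0 = genuine.
accounts = [
    ({"followers_count": 2, "friends_count": 5000, "statuses_count": 1,
      "default_profile_image": True, "description": ""}, 1),
    ({"followers_count": 300, "friends_count": 280, "statuses_count": 4200,
      "default_profile_image": False, "description": "dad, runner"}, 0),
]
X = [features(a) for a, _ in accounts]
y = [label for _, label in accounts]
DecisionTreeClassifier(max_depth=3).fit(X, y)
```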
DNA-inspired online behavioral modeling and its application to spambot detection
We propose a strikingly novel, simple, and effective approach to model online
user behavior: we extract and analyze digital DNA sequences from user online
actions and we use Twitter as a benchmark to test our proposal. We obtain an
incisive and compact DNA-inspired characterization of user actions. Then, we
apply standard DNA analysis techniques to discriminate between genuine and
spambot accounts on Twitter. An experimental campaign supports our proposal,
showing its effectiveness and viability. To the best of our knowledge, we are the first to identify and adapt DNA-inspired techniques for online user behavioral modeling. While Twitter spambot detection is a specific use case on a specific social media platform, our proposed methodology is platform- and technology-agnostic, hence paving the way for diverse behavioral characterization tasks.
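The encoding step itself is straightforward; the sketch below maps each action in a timeline to one character of a small alphabet (the three-letter tweet/reply/retweet alphabet here is an assumption in the spirit of digital DNA, not necessarily the exact one used).

```python
# Hedged sketch: extracting a digital DNA sequence from a timeline.
# Assumed alphabet: A = plain tweet, C = reply, T = retweet.
ALPHABET = {"tweet": "A", "reply": "C", "retweet": "T"}

def digital_dna(timeline: list[dict]) -> str:
    # One character per action, in chronological order.
    return "".join(ALPHABET[post["type"]] for post in timeline)

timeline = [  # hypothetical account activity
    {"type": "retweet"}, {"type": "retweet"}, {"type": "tweet"},
    {"type": "reply"}, {"type": "retweet"},
]
print(digital_dna(timeline))  # "TTACT"
```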
Signed Web Forms
As more and more Web applications become available on the Internet, they are becoming a standard way for many organizations and institutions to offer their services and improve the efficiency of office procedures. Some of these applications require the user to input information, typically by filling out a form, and submit the data. In many cases the user is required to digitally sign the data submitted. The problem of digital signatures has been solved with appropriate algorithms based on the use of two different keys: the private key and the public key. The private key must be known only to its legitimate owner, certified by a Certification Authority, and must be protected from unauthorized access; this has been addressed by means of smart cards and USB tokens. However, when the user decides to sign a document displayed on the screen, the software actually uses the private key to sign an internal representation of the document. Thus, another problem arises: the user must be sure that the document actually signed is the same document shown on screen. In recent years the WYSIWYS (What You See Is What You Sign) approach has been proposed, so that users know exactly what they sign. We propose an architecture based on this technology. The signing module is embedded in a Web Service that must be invoked to obtain the digital signature of a given document. This Web Service shows the document to the user, who decides whether or not to sign it. Finally, we have tested this architecture by implementing a prototype of a form-based Web application.
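To ground the signing step, here is a minimal sketch (illustrative, not the paper's Web Service) that signs exactly the bytes shown to the user with an RSA private key via the `cryptography` package; this sign-what-you-see operation is the core that a WYSIWYS module must protect.

```python
# Hedged sketch: sign exactly the bytes the user was shown.
# In the paper's architecture the private key would live on a
# smart card or USB token rather than in process memory.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Hypothetical form data, exactly as displayed to the user.
document = b"<form><field name='amount'>100</field></form>"
signature = private_key.sign(
    document,
    padding.PKCS1v15(),
    hashes.SHA256(),
)

# Anyone holding the public key can verify the signature matches
# the displayed document, byte for byte (raises if it does not).
private_key.public_key().verify(
    signature, document, padding.PKCS1v15(), hashes.SHA256()
)
print("signature verified")
```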
Description and management of document workflows with an XML-based application
Workflow systems coordinate all the operations involved in processing and transmitting documents, specifying the activities and roles of everyone taking part in the work process. A document workflow follows a document throughout its life cycle, providing constant control over its compilation. The study presented here sheds light on the various problems that arise when describing document workflows. To this end, a conceptual model is defined that allows a document workflow, and all the activities that can be performed on a document, to be described in detail. XML technology was chosen to develop the model, both for structuring the documents and for all the information related to the flow. An agent is understood as any entity, human or software, capable of interacting with the document, while the document flow comprises all the possible paths that the document follows during its life cycle as it passes from one agent to another. The document flow is described with a declarative language by listing all the agents participating in the flow and specifying all the operations that each agent may perform on the document instance. The documents processed by the various agents have a structure defined by an XML schema and are accompanied throughout their life cycle by other documents containing information on the flow, the constraints, and the visualization of the data. Particular emphasis is given to the problems of merging two or more documents compiled concurrently by multiple agents. As for the design of a document workflow management system, two architectural solutions are analyzed: a centralized one and a distributed one. To graphically render the documents to be processed, the XSmiles browser is used, which can display XHTML documents embedding XForms modules. Using innovative techniques such as XML-Signature, all the aspects related to signing the documents modified by the agents are examined.
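As a hedged illustration of the declarative flow description (the element and attribute names are invented for this sketch, not the thesis's actual schema), a workflow can be expressed as agents and their permitted operations in XML and read back with Python's standard library.

```python
# Hedged sketch: a declarative document-workflow description in XML.
# Element/attribute names are hypothetical, not the thesis's schema.
import xml.etree.ElementTree as ET

WORKFLOW = """
<workflow document="leave-request">
  <agent id="employee">
    <operation>fill</operation>
    <operation>sign</operation>
  </agent>
  <agent id="manager">
    <operation>review</operation>
    <operation>sign</operation>
  </agent>
</workflow>
"""

root = ET.fromstring(WORKFLOW)
for agent in root.findall("agent"):
    ops = [op.text for op in agent.findall("operation")]
    print(agent.get("id"), "may:", ", ".join(ops))
```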